Predicting Survival Probability of NASA Aircraft Engines

Using the Cox Proportional Hazards Model

Jayme Reed & Brad Paton (Advisor: Dr. Cohen)

April 22, 2025

Cox Proportional Hazards (CPH) Model

What is it?

  • A statistical regression method for modeling time-to-event (survival) data (Abeysekera and Sooriyarachchi 2009)

    • Survival data consist of a time value and an event indicator showing whether the event occurred
  • Can handle censored data

    • Data are censored when information about an individual in a study is only known for a certain period of time, e.g., the study ends before the individual's event occurs (Klein and Moeschberger 2005)
  • Primarily used in the health field, but also applied to predicting bank failures, the survival probability of machines, and the likelihood of insurance payouts

Limitations

Mathematical Formulas

  • Concordance Index: \(C = \frac{c + \frac{t_x}{2}}{c + d + t_x}\), where \(c\), \(d\), and \(t_x\) are the numbers of concordant, discordant, and tied pairs

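  • CPH Model Hazard Function: \(h(t|\mathbf Z) = h_0(t)\exp(\sum\limits_{k=1}^{p} \beta_kZ_k)\), where \(h_0(t)\) is the baseline hazard and \(Z_1, \dots, Z_p\) are the covariates with coefficients \(\beta_1, \dots, \beta_p\)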

  • Hazard Ratio: \(\frac{h(t|\mathbf Z)}{h(t|\mathbf Z^*)} = \exp[\sum\limits_{k=1}^{p} \beta_k(Z_k - Z_k^*)]\), which does not depend on \(t\) because the baseline hazard cancels

  • Cumulative Hazard Function: \(H(t) = \int_0^t h(u)\, du\)

  • Survival Probability: \(S(t) = e^{-H(t)}\), where \(H(t)\) is the above cumulative hazard function
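The formulas above can be checked numerically. The sketch below is a minimal illustration, not the authors' code: it assumes a hypothetical constant baseline hazard `h0` and made-up coefficients `beta`, and it approximates the cumulative hazard integral with the trapezoidal rule.

```python
import math


def linear_predictor(beta, z):
    """Return sum_k beta_k * Z_k."""
    return sum(b * x for b, x in zip(beta, z))


def hazard(h0_t, beta, z):
    """CPH hazard h(t|Z) = h0(t) * exp(sum_k beta_k * Z_k)."""
    return h0_t * math.exp(linear_predictor(beta, z))


def hazard_ratio(beta, z, z_star):
    """exp(sum_k beta_k * (Z_k - Z*_k)); the baseline hazard cancels out."""
    return math.exp(sum(b * (a - c) for b, a, c in zip(beta, z, z_star)))


def cumulative_hazard(h, t, steps=10_000):
    """Approximate H(t), the integral of h(u) from 0 to t, with the trapezoidal rule."""
    du = t / steps
    total = 0.5 * (h(0.0) + h(t))
    for i in range(1, steps):
        total += h(i * du)
    return total * du


def survival(cum_haz):
    """S(t) = exp(-H(t))."""
    return math.exp(-cum_haz)


# Hypothetical values for illustration only.
h0 = 0.01            # assumed constant baseline hazard per unit time
beta = [0.5, -0.2]   # made-up coefficients
z = [1.0, 2.0]       # covariates of one subject
z_star = [0.0, 0.0]  # reference subject

H = cumulative_hazard(lambda u: hazard(h0, beta, z), t=100.0)
print(round(H, 4), round(survival(H), 4), round(hazard_ratio(beta, z, z_star), 4))
```

With a constant baseline hazard the integral is exact (\(H(100) = 100 \cdot h_0 \cdot e^{0.1}\)); a real CPH fit estimates \(h_0(t)\) from the data instead.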

Assumptions

The CPH model relies on four assumptions:

  • Independence assumption

    • Assumes that the survival times of observed subjects are independent of each other (Nahhas 2025)
  • Non-informative Censoring Assumption

    • Assumes that censoring is non-informative, i.e., whether a subject is censored is unrelated to its risk of the event (Nahhas 2025)
  • Linearity Assumption

    • Assumes each continuous covariate has a linear relationship with the log of the hazard (Nahhas 2025)
  • Proportional Hazards Assumption

    • Assumes the ratio of the hazard rates of any two subjects is constant over time (Bustan 2018)

Evaluation and Survival Probability

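  • Model performance is evaluated using the concordance index

    • The concordance index measures the agreement between the model's predicted risk and the observed survival times, i.e., the proportion of comparable pairs of subjects whose predicted ordering matches their observed ordering

    • A value of 1 means all pairs are correctly ordered, 0.5 is no better than random ordering, and 0 means no pairs are correctly ordered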

  • Survival probability can be predicted at a specific time \(t\)

    • If the probability is \(\geq 50\)%, it is assumed the event has not occurred

    • If the probability is \(< 50\)%, it is assumed the event has occurred
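A minimal sketch of the concordance index from the formula \(C = \frac{c + t_x/2}{c + d + t_x}\) and of the 50% decision rule. It assumes that higher `risk_scores` mean earlier expected failure (for example, the linear predictor of a fitted CPH model), counts a pair as comparable only when the shorter observed time ends in an event, and skips pairs with tied times for simplicity; the function names are hypothetical.

```python
def concordance_index(times, events, risk_scores):
    """Harrell-style C-index: (c + t_x / 2) / (c + d + t_x).

    A pair (i, j) is comparable when subject i has an observed event
    (events[i] == 1) and a strictly shorter time than subject j. It is
    concordant if i also has the higher risk score, discordant if lower,
    and tied if the scores are equal. Pairs with tied times are skipped.
    """
    c = d = t_x = 0
    n = len(times)
    for i in range(n):
        if events[i] != 1:
            continue  # censored subjects cannot anchor a comparable pair
        for j in range(n):
            if times[i] < times[j]:
                if risk_scores[i] > risk_scores[j]:
                    c += 1
                elif risk_scores[i] < risk_scores[j]:
                    d += 1
                else:
                    t_x += 1
    total = c + d + t_x
    if total == 0:
        raise ValueError("no comparable pairs")
    return (c + t_x / 2) / total


def predicted_failed(survival_probability, threshold=0.5):
    """Slide's rule: survival below 50% means the event is assumed to have occurred."""
    return survival_probability < threshold


# Toy data: 3 concordant and 2 discordant comparable pairs.
times = [10, 20, 30, 40]
events = [1, 1, 0, 1]
risk = [2.5, 0.3, 0.4, 0.9]
print(concordance_index(times, events, risk))
print(predicted_failed(0.47), predicted_failed(0.86))
```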

Data Structure

The data for this project come from NASA's damage propagation modeling study of aircraft engine run-to-failure simulations, specifically the second pair of training and testing datasets (FD002) (Saxena et al. 2008).

  • Each engine in the NASA data starts with an unknown degree of initial wear and manufacturing variation, and its sensor measurements contain noise

  • There are three operational setting fields (os1-os3) and twenty-one sensor measurement fields (sm1-sm21)

  • A status column was added to both the testing and training datasets: 0 indicates the engine has not failed and 1 indicates the engine has failed
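A sketch of how per-cycle engine records could be collapsed into one survival record per engine with a status column. It assumes whitespace-separated rows whose first two fields are the engine id and cycle number, followed by the operational settings and sensor measurements, and it keeps each engine's last-cycle readings. That choice, the hypothetical function name `to_survival_records`, and assigning status 1 to every training engine (run to failure) and 0 to every testing engine (series ends before failure) are assumptions, not the authors' documented procedure.

```python
def to_survival_records(lines, status):
    """Collapse per-cycle rows into one (id, time, status, readings) record per engine.

    Assumed row layout: engine id, cycle number, then the operational settings
    and sensor measurements, separated by whitespace. `time` is the engine's
    last observed cycle, `readings` are the values recorded at that cycle, and
    `status` (1 = failed, 0 = not failed) is supplied by the caller.
    """
    last = {}
    for line in lines:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        engine_id, cycle = int(fields[0]), int(fields[1])
        readings = [float(x) for x in fields[2:]]
        if engine_id not in last or cycle > last[engine_id][0]:
            last[engine_id] = (cycle, readings)
    return [
        (engine_id, cycle, status, readings)
        for engine_id, (cycle, readings) in sorted(last.items())
    ]


# Toy rows (engine id, cycle, os1, os2); not real NASA values.
train_rows = ["1 1 10.0 0.25", "1 2 20.0 0.84", "2 1 35.0 0.84"]
test_rows = ["1 1 42.0 0.84"]
train = to_survival_records(train_rows, status=1)  # assumed: training engines run to failure
test = to_survival_records(test_rows, status=0)    # assumed: testing series end before failure
print(train)
print(test)
```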

NASA Aircraft Engine Data
id  time  status  os1      os2     os3  sm1  sm2     sm3
1   149   1       42.0017  0.8414  100  445  550.49  1366.01
2   269   1       42.0047  0.8411  100  445  550.11  1368.75
3   206   1       42.0073  0.8400  100  445  550.80  1356.97

Exploration

  • 519 engines in the combined data

    • 260 engines in training data

    • 259 engines in testing data

Summary Metrics
Metric              Time (cycles)
Minimum             128.00
Median              199.00
Mean                206.77
Standard Deviation  46.78
Maximum             378.00
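Summary metrics like these can be reproduced with Python's `statistics` module. This sketch uses made-up times, not the NASA data; `statistics.stdev` is the sample standard deviation (divisor \(n - 1\)), which is an assumption about how the table's value was computed.

```python
import statistics


def summarize(times):
    """Summary metrics for a list of engine times."""
    return {
        "Minimum": min(times),
        "Median": statistics.median(times),
        "Mean": statistics.mean(times),
        "Standard Deviation": statistics.stdev(times),  # sample SD, divisor n - 1
        "Maximum": max(times),
    }


# Made-up times for illustration only.
for metric, value in summarize([120, 150, 200, 240, 390]).items():
    print(f"{metric:<20}{value:.2f}")
```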

Visualization

Creating CPH Model

The table lists each model's covariates, its AIC and BIC from stepwise regression where applicable, and its concordance index.

Model    Covariates                                      AIC      BIC      Concordance
Model 1  All covariates                                  2449.92  2547.71  0.6956658
Model 2  os3, sm3 - sm5, sm8 - sm9, sm15 - sm18, sm20    2431.88  N/A      0.6917216
Model 3  os3, sm3 - sm4, sm8 - sm9, sm15, sm18           N/A      2461.31  0.6833401
Model 4  os3, sm13 - sm14, sm19                          N/A      N/A      0.5881404

The model used for the rest of the analysis includes all covariates except sm16 and sm19.
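Stepwise selection compares candidate models by an information criterion: \(\text{AIC} = 2k - 2\log L\) and \(\text{BIC} = k\log n - 2\log L\), where \(k\) is the number of coefficients and \(L\) is the maximized (partial) likelihood. The sketch below uses hypothetical log-likelihoods, not values from the fitted models. Which sample size \(n\) the BIC uses for a Cox model depends on the software (the number of events is a common choice), so `n` here is an assumption.

```python
import math


def aic(log_lik, k):
    """Akaike information criterion: 2k - 2 log L."""
    return 2 * k - 2 * log_lik


def bic(log_lik, k, n):
    """Bayesian information criterion: k log(n) - 2 log L."""
    return k * math.log(n) - 2 * log_lik


def best_by(scores):
    """Name of the candidate with the smallest criterion value."""
    return min(scores, key=scores.get)


# Hypothetical (log partial likelihood, number of coefficients) pairs.
candidates = {"larger model": (-1180.0, 24), "smaller model": (-1195.0, 11)}
n = 260  # assumed sample size for BIC

aics = {name: aic(ll, k) for name, (ll, k) in candidates.items()}
bics = {name: bic(ll, k, n) for name, (ll, k) in candidates.items()}
print(best_by(aics), best_by(bics))
```

Because BIC charges \(\log n\) per coefficient instead of 2, it penalizes extra covariates more heavily once \(n > e^2 \approx 7.4\). In this toy example AIC prefers the larger model while BIC prefers the smaller one, which mirrors the BIC-selected model 3 keeping fewer covariates than the AIC-selected model 2.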

CPH Model Forest Plot

Checking Assumptions

  • Independence: this assumption is met because the survival time of each engine in the data is independent of the survival times of the other engines
  • Non-informative censoring: this assumption is met because knowing that an engine is censored provides no additional information about its survival probability
  • Linearity: this assumption is tested by plotting the Martingale residuals (Martingale residual = observed events - expected events) against each covariate and checking whether the trend has a slope of zero

    • Based on the Martingale residual plots, this assumption is met

  • Proportional hazards: this assumption is tested by examining the scaled Schoenfeld residuals (the covariate value of the engine that fails minus the expected value of that covariate over the engines at risk at the time of failure) and the p-values of the associated tests

    • Based on the table below and the residual plots, the assumption is met globally (GLOBAL p = 0.0625 > 0.05), although os1 on its own has p = 0.038
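The Martingale residual for engine \(i\) is its observed event indicator (1 if it failed, 0 if censored) minus its expected number of events, which for a CPH model is the estimated cumulative hazard \(\hat H(t_i \mid \mathbf Z_i)\) at its observed time. The sketch below computes the residuals and, as a crude stand-in for the residual plots, a least-squares slope against one covariate. The inputs are toy numbers, and the straight-line slope is a simplification; a smoother such as LOESS is more usual in practice.

```python
def martingale_residuals(status, cum_hazard):
    """Observed events (1 = failed, 0 = censored) minus expected events H(t_i | Z_i)."""
    return [d - h for d, h in zip(status, cum_hazard)]


def ols_slope(x, y):
    """Least-squares slope of y on x (a crude stand-in for a smoothed residual plot)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    return sxy / sxx


# Toy inputs: event indicators, fitted cumulative hazards, and one covariate.
status = [1, 0, 1, 1]
cum_hazard = [0.9, 0.4, 1.2, 0.8]
covariate = [1.0, 2.0, 3.0, 4.0]
resid = martingale_residuals(status, cum_hazard)
print(resid, ols_slope(covariate, resid))
```

A slope near zero (and no curvature in the plot) supports the linearity assumption for that covariate.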

Variable  Chi-square  df  p-value
os1       4.293034    1   0.0382688
os2       2.188373    1   0.1390561
sm1       3.083761    1   0.0790775
sm2       1.092906    1   0.2958282
GLOBAL    32.955156   22  0.0625074
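For reference, a sketch of unscaled Schoenfeld residuals for a single covariate: at each failure time, the failing engine's covariate value minus the risk-weighted mean of that covariate over the engines still at risk, with weights \(\exp(\beta Z_j)\). The scaled residuals behind the table are further rescaled using the model's estimated coefficient variance; that step and the chi-square test itself are omitted here, and `beta` is a placeholder rather than a fitted value.

```python
import math


def schoenfeld_residuals(times, status, z, beta):
    """Unscaled Schoenfeld residuals for one covariate.

    For each subject i with an event, the residual is z_i minus the
    risk-weighted mean of z over the risk set R(t_i) = {j : t_j >= t_i},
    with weights exp(beta * z_j). Tied event times are not handled specially.
    """
    out = []
    for i, (t_i, d_i) in enumerate(zip(times, status)):
        if d_i != 1:
            continue  # residuals are defined only at observed failures
        risk = [j for j, t_j in enumerate(times) if t_j >= t_i]
        w = [math.exp(beta * z[j]) for j in risk]
        expected = sum(wj * z[j] for wj, j in zip(w, risk)) / sum(w)
        out.append((t_i, z[i] - expected))
    return out


# Toy data; with beta = 0 the weights are all 1.
print(schoenfeld_residuals([1, 2, 3], [1, 1, 1], [3.0, 1.0, 2.0], beta=0.0))
```

A systematic trend of these residuals against time suggests the covariate's effect changes over time, which is what the per-variable and GLOBAL tests check.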

Model Results

  • The CPH model was found to be valid and has a concordance index of {r} ci
  • From the fitted coefficients, we calculate each engine's hazard rate, cumulative hazard, and survival probability
id  time  status  Hazard Rate  Survival Probability
1   149   1       7.657873     0.8617456
2   269   1       2.585519     0.1719981
3   206   1       3.085786     0.4688056
4   235   1       1.488771     0.5305390
5   154   1       1.492792     0.9575895
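The slides do not state how the survival probabilities were computed. One standard route, sketched below under that assumption, estimates the baseline cumulative hazard \(H_0(t)\) with the Breslow estimator and then applies \(S(t \mid \mathbf Z) = \exp(-H_0(t)\exp(\sum_{k} \beta_k Z_k))\). The coefficients `beta` would come from the fitted model; here they are placeholders, and whether the table's "Hazard Rate" equals \(\exp(\sum_k \beta_k Z_k)\) is not confirmed by the source.

```python
import math


def risk_score(beta, z):
    """exp(sum_k beta_k * Z_k)."""
    return math.exp(sum(b * x for b, x in zip(beta, z)))


def breslow_baseline_cum_hazard(times, status, covariates, beta, t):
    """Breslow estimate of the baseline cumulative hazard H0(t).

    For each distinct event time s <= t, add (number of events at s) divided by
    the summed risk scores of all subjects still at risk at s (time >= s).
    """
    scores = [risk_score(beta, z) for z in covariates]
    event_times = sorted({ti for ti, di in zip(times, status) if di == 1 and ti <= t})
    h0 = 0.0
    for s in event_times:
        events_at_s = sum(1 for ti, di in zip(times, status) if di == 1 and ti == s)
        at_risk = sum(sc for ti, sc in zip(times, scores) if ti >= s)
        h0 += events_at_s / at_risk
    return h0


def survival_probability(h0_t, beta, z):
    """S(t | Z) = exp(-H0(t) * exp(sum_k beta_k * Z_k))."""
    return math.exp(-h0_t * risk_score(beta, z))


# Toy data with one covariate and a placeholder coefficient.
times = [1, 2, 3]
status = [1, 0, 1]
covariates = [[0.0], [1.0], [0.0]]
beta = [math.log(2)]
h0 = breslow_baseline_cum_hazard(times, status, covariates, beta, t=3)
print(h0, survival_probability(h0, beta, [1.0]))
```

With all coefficients set to zero the Breslow estimator reduces to the Nelson-Aalen estimator of the cumulative hazard.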

Conclusion

  • Based on the model, estimated survival stays at 100% only up to 128 cycles, so NASA should not use its engines beyond 128 cycles
Survival  Time
100%      128
75%       201
50%       245
25%       299
10%       347
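Percentile survival times like those in the table can be read off an estimated survival curve as the first time at which \(S(t)\) drops to the target level; the 50% row is then the median survival time. The sketch below assumes the curve is available as sorted (time, probability) pairs from a step-function estimate. How the authors produced their table is not stated, so this is one common convention rather than their method.

```python
def time_at_survival(curve, p):
    """Return the first time at which the survival estimate falls to p or below.

    `curve` is a list of (time, survival_probability) pairs sorted by time,
    as produced by a step-function survival estimate. Returns None when the
    curve never drops to p (the percentile is not reached in the data).
    """
    for time, surv in curve:
        if surv <= p:
            return time
    return None


# Toy step curve, not the NASA results.
curve = [(10, 0.9), (20, 0.6), (30, 0.4), (40, 0.1)]
for p in (0.75, 0.5, 0.25, 0.05):
    print(p, time_at_survival(curve, p))
```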

References

Abeysekera, W. W. M., and M. R. Sooriyarachchi. 2009. “Use of Schoenfeld’s Global Test to Test the Proportional Hazards Assumption in the Cox Proportional Hazards Model: An Application to a Clinical Study.” https://www.researchgate.net/publication/238483310_Use_of_Schoenfeld's_global_test_to_test_the_proportional_hazards_assumption_in_the_Cox_proportional_hazards_model_An_application_to_a_clinical_study.
Asghar, Naseem, Khalil Umair, and Iftikhar Uddin. 2024. “Mixture and Non-Mixture Cure Models for the Survival Analysis of SARS-CoV-2 Patients in Khyber Pakhtunkhwa, Pakistan.” Pakistan Journal of Medical Sciences 40 (8): 1841–46.
Bustan, M. Nadjib. 2018. “Cox Proportional Hazard Survival Analysis to Inpatient Breast Cancer Cases.”
Jiang, Nan, Wu Yongfa, and Chengjia Li. 2024. “Limitations of Using COX Proportional Hazards Model in Cardiovascular Research.” Cardiovascular Diabetology, no. 1: 1–2.
Klein, John P., and Melvin L. Moeschberger. 2005. Survival Analysis: Techniques for Censored and Truncated Data. 2nd ed. Springer.
Nahhas, Ramzi W. 2025. Introduction to Regression Methods for Public Health Using R. 1st ed. Chapman & Hall.
Saxena, Abhinav, Kai Goebel, Don Simon, and Neil Eklund. 2008. “Damage Propagation Modeling for Aircraft Engine Run-to-Failure Simulation,” 1–9. https://doi.org/10.1109/PHM.2008.4711414.
Wang, Weiwei, Xiaotian Chang, and Feifei Lin. 2025. “Adding Salt to Foods and Risk of Incident Depression and Anxiety.” BMC Medicine, no. 1: 1–10.
Zhang, Yue, Yangyang Cheng, and Rodrigo M Carrillo-Larco. 2025. “Postpartum Depression in Relation to Chronic Diseases and Multimorbidity in Women’s Mid-Late Life: A Prospective Cohort Study of UK Biobank.” BMC Medicine 23 (1): 1–13.